GES668: Property Assessments in Baltimore

Joshua Spokes

Overview

Are certain types of properties getting a free ride on their taxes? Project Origin

Project goals

  1. Identify tranches of property suspected to follow this pattern (Vacant, Auto-Oriented, etc.)

  2. Match these patterns to recorded sales

  3. Aggregate sales data by block group

Vacants

Changes in Scope

  • Analysis was limited to the last five years

  • Analysis limited by data quality

  • Aggregation switched from census tracts to neighborhoods

Data sources

  • Maryland CAMA/Assessment data is all public domain

  • Baltimore DHCD data licensed under Creative Commons 3.0

  • Baltimore Vacants data available under DOI DOI

Baltimore Real Property

Baltimore’s real property dataset provides additional attributes not provided at the state level like neighborhood and whether it’s unimproved property

Baltimore Vacant Building Notices

All vacant building notices issued in Baltimore back to the 1970s

Maryland Property Assessment

This data shows the property attributes of every piece of real property in Maryland, which includes location, assessment values, and some building characteristics.

Maryland CAMA Data

Building Characteristics

Core

Baltimore Neighborhoods

Analysis Approach

  • Combination of exploratory analysis and visualization

  • Relying heavily on dplyr and sf libraries for analysis

  • rdeck and tmap used for visualization

Identifying sales of vacant property

sales_valid_date |>
  left_join(dates, by = join_by(property == BLOCKLOT)) |>
  mutate(
    vacant_at_sale = (date >= DateNotice) & ((is.na(date_terminate) | date <= date_terminate))
  ) |>
  group_by(BLOCKLOT, sale) |>
  summarise(across(everything(), first),
            vacant_at_sale = any(vacant_at_sale, na.rm = TRUE)
  ) |>
  ungroup() -> vacant_sale

I was scratching my head trying to figure this out and asked ChatGPT [@ChatGPT] for help on how to approach this, which resulted in this elegant method for determining where there was a sale of vacant property.

Categorizing other sales

left_join(sales, property_land_use, by = join_by(property == BLOCKLOT)) |>
  group_by(property, transfer_no) |>
  summarise(date = first(date),
            price = first(price),
            block = first(block.x),
            property = first(property),
            acct_id_full = first(acct_id_full.x),
            vacant_at_sale = first(vacant_at_sale),
            Land_Value = first(Land_Value),
            Improvement_Value = first(Improvement_Value),
            Total_Assessment = first(Total_Assessment),
            NEIGHBOR = first(NEIGHBOR),
            BL_DSCTYPE = first(BL_DSCTYPE),
            BL_DSCSTYL = first(BL_DSCSTYL),
            CM_DSCIUSE = first(CM_DSCIUSE),
            NO_IMPRV = first(NO_IMPRV),
            .groups = "keep") |>
  mutate(identifier = case_when(
    (NO_IMPRV == "Y" & is.na(BL_DSCTYPE))    ~ "unimproved",
    vacant_at_sale                           ~ "vacant",
    str_detect(BL_DSCTYPE, "AUTO|WAREHOUSE") ~ "unperforming",
    .default = "regular"),
    price_ratio = Total_Assessment / price) |>
  ungroup() -> all_sales

Aggregation by neighborhood

all_sales |>
  group_by(NEIGHBOR, identifier) |>
  summarise(med_price_ratio = median(price_ratio),
            mean_price_ratio = mean(price_ratio),
            med_price = median(price),
            mean_price = mean(price),
            n = n(),
            .groups = "keep") |>
  pivot_wider(id_cols = "NEIGHBOR",
              names_from = "identifier",
              names_glue = "{identifier}_{.value}",
              values_from = c("med_price_ratio",
                              "mean_price_ratio",
                              "med_price",
                              "mean_price",
                              "n")) %>%
  left_join(neighborhoods, ., by = join_by(Name == NEIGHBOR)) |>
  mutate(pct_blk = Blk_AfAm / Population,
         pct_wht = White / Population) |>
  select(Name,
         Population,
         pct_blk,
         pct_wht,
         starts_with(c("vacant",
                       "unimproved",
                       "unperforming",
                       "regular"))) -> neighborhood_stats

Correlation Testing

The correlation shows a statistically significant (p-value = 0.00139) positive relationship between the median price and median sale to assessment ratio.

[1] 0.2137546

Mapping assessments

Despite what I was expecting, there is no clear pattern here.

Difference in price ratio

This map shows neighborhoods where there was at least 25 sales of vacant homes. Based on this, we can see once again that at this level, the assessments subvert hypothesis because vacant price ratios are higher than regular sales.

Challenges in working with data

  • Baltimore City and Maryland State data use separate keys (BLOCKLOT and Account ID)

  • I frequently found myself backtracking to add attributes to various intermediate datasets

  • Getting to the end and realizing that data isn’t clean enough for conclusions

Successes in working with data

I think my analysis does a good job in taking a very big dataset and distilling it to a more useable format. I aimed to make the data more tidy so that it would be easier to create comparisons

Where to learn more

All data, code, an explanation of the analysis and inspirations are available through the README on my project repository